GREATER UNDERSTANDING OF INFERENTIAL STATISTICAL METHODS
AND EXPERIMENTAL DESIGN WOULD ENABLE LANGUAGE TEACHERS TO
INTERPRET OBJECTIVELY AVAILABLE RESEARCH REPORTS AND
HOPEFULLY WOULD ENCOURAGE MORE EXPERIMENTATION. TO UNDERSTAND
THE EXPERIMENTAL PROCESS, EDUCATORS MUST REALIZE THAT ANY
EXPERIMENT IS AS SUCCESSFUL AS THE EXPERIMENTER IS IN
DESIGNING A STUDY ISOMORPHIC TO THE UNDERLYING MATHEMATICAL
ASSUMPTIONS AND IN APPLYING STATISTICAL ANALYSIS APPROPRIATE
TO THESE ASSUMPTIONS. A RANDOM SAMPLING IS A LOGICAL STEP IN
ARRIVING AT THE PROBABILITY VALUES USED IN TESTING A
HYPOTHESIS BECAUSE ITS CONSTANT AND INDEPENDENT PROBABILITY
FACTORS CAN BE COPED WITH IN SIMPLE MATHEMATICAL TERMS.
WITHOUT AT LEAST A CURSORY ACQUAINTANCE WITH RESEARCH
REPORTING, EXPERIMENTAL DESIGN, AND STATISTICAL ANALYSIS,
EDUCATORS MIGHT EASILY ACCEPT THE EXPERIMENTER’S CONCLUSIONS
WITHOUT ANALYZING THE CONTENT THAT LED TO THE CONCLUSIONS.
NEVERTHELESS, INTELLIGENT APPRAISAL OF THE LIMITED RESEARCH
AVAILABLE REQUIRES A THOROUGH EVALUATION OF ALL AREAS OF
INTERNAL AND EXTERNAL VALIDITY. THIS ARTICLE APPEARED IN
"HISPANIA," VOLUME 50, NUMBER 3, SEPTEMBER 1967, PAGES
496-500. (AB)
LET’S LOOK AT RESEARCH

Kenneth Chastain 
Purdue University 

Tharp and McDonald, in compiling a bibliography of research
in foreign language for the Review of Educational Research
in 1938, included only those articles dealing with research
studies. Their findings were quite limited. In the period
from December 1933 to 1937 only thirteen articles qualified
as reports of true research studies. By 1943 the
bibliography had grown to sixty-six references. However,
most were opinion articles rather than reports of
experimental studies. The criterion of including only
research studies had evidently been abandoned. The same
seems largely true of subsequent bibliographies in the
field of modern foreign languages. The majority of the
articles appearing in the journals deal principally with
personal opinions and preferences.



Certainly the small number of true experiments in the field
of modern foreign languages is not to be desired. A wider
acceptance and understanding of experimental research and
design should be promoted by all those connected with
language study. An increased understanding of inferential
statistical methods and experimental design would hopefully
lead to an increasing number of experiments among language
teachers and at the same time allow them to study much more
objectively the research articles now being published.
Stanley and Campbell, writing in the Handbook of Research
on Teaching, state that they are “gradually coming to the
view that experimentation within schools must be conducted
by regular staff of the schools concerned, whenever
possible, especially when findings are to be generalized to
other classroom situations.” Obviously some knowledge of
research is necessary to conduct experiments, but perhaps
not so obviously, and of even greater importance, is the
need for understanding by all in order to study the reports
of experiments being conducted.

To gain an understanding of the experimental process
(without going into the differences between parametric and
non-parametric statistics) one must understand that the
basis for statistical analysis of any experiment is the
underlying mathematical principles. The research project is
designed in such a way that the experimental situation
corresponds to its mathematical considerations. The
experiment then is successful to the point that the
experimenter is successful in designing a study isomorphic
to the underlying mathematical assumptions and applying the
statistical analysis appropriate to these assumptions. In
discussing the logic of hypothesis testing Hays describes
the experimental process as follows: “from the hypothetical
population distribution one obtains a theoretical sampling
distribution. Then the obtained results are compared with
the sampling distribution probabilities. If the probability
of samples such as the one obtained is high, the hypothesis
is regarded as tenable. On the other hand, if the
probability of such a sample (or one in more extreme
disagreement with what is expected) is quite small, then
doubt is cast on the hypothesis.” The importance of
obtaining a random sampling in hypothesis testing is
explained by Winer in the following paragraph: “If a sample
is drawn in such a way that (1) all elements in the
population have an equal and constant chance of being drawn
on all draws and (2) all possible samples have an equal (or
a fixed and determinable) chance of being drawn, the
resulting sample is a random sample from the specified
population. By no means should a random sample be
considered a haphazard, unplanned sample . . . Random
samples have properties which are particularly important in
statistical work. This importance stems from the fact that
random sampling ensures constant and independent
probabilities; the latter are relatively simple to handle
mathematically.” In other words, random samples are
necessary in order to arrive at the probability values used
in hypothesis testing.
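
The logic described by Hays and Winer can be sketched in a
few lines of Python. This is only an illustration under
stated assumptions and is not taken from either author: the
population of 2,000 scores, the sample size of 25, the two
obtained means, and the function names are all invented for
the example. Samples are drawn with replacement so that
every element has the same chance on every draw, and the
sampling distribution is approximated by repeated drawing
rather than derived mathematically.

```python
import random
import statistics

def sampling_distribution(population, sample_size, draws, rng):
    """Means of repeated random samples drawn with replacement,
    so each element has an equal and constant chance on every draw."""
    return [statistics.fmean(rng.choices(population, k=sample_size))
            for _ in range(draws)]

def two_sided_probability(sample_means, hypothesized_mean, obtained_mean):
    """Share of sample means at least as far from the hypothesized
    mean as the obtained mean is."""
    distance = abs(obtained_mean - hypothesized_mean)
    extreme = sum(1 for m in sample_means
                  if abs(m - hypothesized_mean) >= distance)
    return extreme / len(sample_means)

rng = random.Random(1967)  # fixed seed so the sketch is repeatable

# Hypothetical population of test scores (assumed values).
population = [rng.gauss(70, 10) for _ in range(2000)]
hypothesized_mean = statistics.fmean(population)

means = sampling_distribution(population, sample_size=25, draws=5000, rng=rng)

# An obtained mean close to what the hypothesis predicts...
p_typical = two_sided_probability(means, hypothesized_mean,
                                  hypothesized_mean + 1.0)
# ...and one in strong disagreement with it.
p_extreme = two_sided_probability(means, hypothesized_mean,
                                  hypothesized_mean + 6.0)

print(f"sample mean 1 point from hypothesis: p = {p_typical:.3f}")
print(f"sample mean 6 points from hypothesis: p = {p_extreme:.3f}")
```

When the estimated probability is high the hypothesis remains
tenable; when it is quite small, doubt is cast on the
hypothesis. The estimate is meaningful only because the
samples were drawn at random.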

Keeping the importance of the mathema- 
tical assumptions in mind, we can now 
move toward an examination of the ex- 
perimental situation itself. Stanley and 
Campbell have listed the following criteria 
for evaluating experimental studies: 

I. Internal validity— This helps us to answer 
questions as to whether or not the true ob- 
jectives of the study are really being meas- 
ured. Were the obtained results due to the 
effects of the treatments applied in the ex- 
periment? 

A. History— Without a knowledge of what 
happens during the experiment we can 
not be sure whether the results are due 
to our treatment or to some unrelated 
variable or variables. 
B. Maturation— The obtained results might have occurred in
the individual's normal developmental process even if no
experiment had taken place, i.e., if no treatment had been
applied.

C. Testing— It is most important to consider the effect of
previous testing on a set of second test scores.

D. Instrumentation— Changes in the criteria measures such
as tests or observers may give different results.

E. Statistical regression— No experiment is adequate unless
a random sample is obtained. Groups which have been
selected from either high scores or low scores will tend to
regress toward the average.

F. Selection— The experimenter should be careful to avoid
bias in obtaining his samples for the comparison groups.

G. Experimental mortality— The treatments may effect a
differential loss among members of the comparison groups.

H. Selection-maturation interaction— An interaction between
the selection of the groups and the maturation of group
members might affect criteria scores.

II. External validity— As well as an examination of the
study itself we need to test its applicability. Are this
study and group representative of other groups and
treatment variables? Would the same results be obtained in
other similar groups in the hypothesized population, for
example, first year language classes?

A. Reactive or interaction effect of testing— Pretests may
“increase or decrease respondents' sensitivity to treatment
and make results unrepresentative for the unpretested
universe.” Would groups not given the pretests obtain the
same results?

B. Interaction effects of selection biases and the
experimental variable— As a result of selection bias
obtained scores may be radically different from what might
be obtained from a true random sample of the population or
group in which the experimenter is interested.

C. Reactive effects of experimental arrangements— The
students or groups involved may react to the experimental
situation itself. The obtained results would then be
inapplicable to groups not in the experiment.

D. Multiple-treatment interference— In an experiment with
repeated treatments prior treatments may affect results.
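
Statistical regression, item E under internal validity, is
perhaps the least intuitive of these criteria, and a small
simulation may help. The sketch below is an illustration
only: the 1,000 students, their "ability" values, and the
size of the test error are assumptions made for the
example. No treatment of any kind is applied between the
two tests.

```python
import random
import statistics

rng = random.Random(50)  # fixed seed so the sketch is repeatable

# Each hypothetical student has a stable ability; every test
# score is that ability plus independent measurement error.
abilities = [rng.gauss(70, 8) for _ in range(1000)]
first_test = [a + rng.gauss(0, 8) for a in abilities]
second_test = [a + rng.gauss(0, 8) for a in abilities]

# Select a group by its extreme first-test scores instead of at random.
cutoff = sorted(first_test)[-100]  # the 100th-highest first-test score
top = [i for i, score in enumerate(first_test) if score >= cutoff]

overall_mean = statistics.fmean(first_test)
top_first = statistics.fmean(first_test[i] for i in top)
top_second = statistics.fmean(second_test[i] for i in top)

print(f"mean of all students, first test:  {overall_mean:.1f}")
print(f"selected group, first test:        {top_first:.1f}")
print(f"selected group, second test:       {top_second:.1f}")
```

The selected group scores lower on the second test although
nothing was done to it. An experimenter who had applied a
treatment to such a group could easily mistake this drift
toward the average for a treatment effect.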

One of the most likely dangers is that educators not
familiar with research reporting, experimental design, and
statistical analyses will merely turn to the last pages of
the article and accept the author’s conclusions without
examining the content which led to those conclusions.
Without at least a cursory understanding of the criteria
listed above an educator leaves himself open to
misinterpretation of the relatively scant number of
research studies which have been reported. For example,
such a comment as that by Prof. Stack in the Modern
Language Journal of April, 1964, “Further negative
predisposition may be reflected by the ‘null hypothesis to
be tested . . .’ ” must either be credited to an uninformed
person or to one guilty of the same bias of which he had
been criticizing another writer. Granting that there is a
support position in statistics as well as a rejection
position (the traditional position for several reasons has
been that of setting up a null or negative hypothesis which
is then rejected), one must conclude that there is no basis
in fact for such a statement.

Let’s examine a recent, well-known study in the teaching of
modern foreign languages, A Psycholinguistic Experiment in
Foreign-Language Teaching by Scherer and Wertheimer. (The
reader should note here that the purpose of the following
is to apply the criteria previously given to indicate
aspects of any study which may be questionable. The scope
of this paper involves merely a sample application of
criteria measures and does not allow for a complete
evaluation.)

First and foremost, it is questionable whether the
mathematical assumptions upon which all statistical
analyses rest were met. Before the semester began it was
announced that the University of Colorado had been granted
government funds to test a new method of teaching modern
foreign languages. This announcement together with the
connotations accompanying it led to the following
statements: “as soon as the students learned that some
sections were being taught by an audio-lingual method, many
of those in the control group wanted to change to the
experimental group,” and “The spring-semester registration
became somewhat confused because many students tried to
register for experimental sections after having been in
control sections during the fall semester.” Given the above
situation it is doubtful that a random sample upon which
both parametric and non-parametric statistics depend could
have been obtained.
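
Why student preference matters can be shown with a
deliberately exaggerated simulation. It is a sketch under
assumed conditions and makes no claim about what actually
happened in the Colorado study: the 400 students, the
"motivation" variable, and its influence on scores are all
invented. The teaching method is given no effect at all, so
any difference between groups comes from how the groups
were formed.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical students: motivation drives the final score.
# The teaching method has NO effect in this simulation.
motivation = [rng.gauss(0, 1) for _ in range(400)]
scores = [70 + 5 * m + rng.gauss(0, 5) for m in motivation]
ids = list(range(400))

def mean_difference(experimental_ids):
    """Experimental-group mean minus control-group mean."""
    chosen = set(experimental_ids)
    experimental = [scores[i] for i in ids if i in chosen]
    control = [scores[i] for i in ids if i not in chosen]
    return statistics.fmean(experimental) - statistics.fmean(control)

# Self-selection (exaggerated): the more motivated half of the
# students gets itself into the experimental sections.
by_motivation = sorted(ids, key=lambda i: motivation[i])
self_selected_diff = mean_difference(by_motivation[200:])

# Random assignment: every student has the same chance of either group.
random_diff = mean_difference(rng.sample(ids, 200))

print(f"difference with self-selection:    {self_selected_diff:+.2f}")
print(f"difference with random assignment: {random_diff:+.2f}")
```

With self-selection the "experimental" group comes out well
ahead even though the method does nothing; with random
assignment the difference stays near zero.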

Now let’s turn to an examination of 
the treatment conditions themselves. The 
audio-lingual materials were prepared to 
correspond with the text being used by 
the traditional class. This attempt to 
equate the content raises the question of 
whether it is possible to teach an active 
command of specified content in the same 
amount of time needed to teach a passive 
command. Supposedly both classes covered 
the same material during the academic year 
(p. 28).

We can now turn to a consideration of 
the confounders of internal validity. Dur- 
ing a consideration of history several ques- 
tions arise. (1) The instructors of the ex- 
perimental and control groups were al- 
lowed to teach their preference. Failure to 
attempt to control the instructor variable 
could have influenced the results tremen- 
dously (p. 4). (2) “Since the same end-of- 
semester examinations were to be given to 
both groups, the laboratory was made 
available to the students of the control 
group during the last few days of the sem- 
ester, and copies of the text for the control 
group were made available to the experi- 
mental students at the same time” (p. 26). 
(3) “Since testing took place in the various 
sections from January 20 to January 28, 
there can be no doubt that some students 
in the later sections received a little information
about the tests from students in earlier sections”
(p. 28). Also, the instrumentation could have
affected the internal
validity. The authors admitted that it 
would have been better if the MLA 
achievement tests had been available (p. 
27). 

This study is even more suspect with 
regard to external validity. It is doubtful 
that the results, due to unusual experimental
arrangements not normal to introductory MFL
classes, can be applied to
other situations involving first year lan- 
guage students. In the experiment the 
students assembled in the evening to take 
a series of pretests (p. 25). Both the hour 
of the tests and the quantity were not
normal and would have been likely to
create a reactive effect on the part of the 
students. Owing to the student predisposi- 
tion toward the experimental classes, men- 
tioned previously, one can assume that 
there were interaction effects of selection 
biases and the experimental variable and 
that there were reactive effects to experi- 
mental arrangements. 

In addition to the above weaknesses one must include an
additional one which the authors mention. Tests of
significance, t-tests, on the difference between means were
used to assess group differences, and in the authors’
words, “Such statistics are not, strictly speaking, fully
legitimate for some of the measures used, since means and
correlation coefficients imply a particular degree of
refinement of measurement such as interval scaling; by no
stretch of the imagination can all our scores be thought to
meet these criteria” (p. 2).
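
For readers unfamiliar with the statistic in question, the
following sketch computes a t value for the difference
between two independent means using the pooled-variance
form. It is a generic textbook calculation, not a
reconstruction of the authors' analysis, and the scores are
invented for illustration. Note that every step works with
means and variances, which is why the calculation
presupposes measurement on at least an interval scale.

```python
import math
import statistics

def pooled_t(group_a, group_b):
    """Return (t, degrees of freedom) for the difference between
    the means of two independent groups, pooling their variances."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = statistics.fmean(group_a)
    mean_b = statistics.fmean(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_variance = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df

# Hypothetical scores, invented for illustration only.
experimental = [78, 85, 69, 91, 74, 88, 80, 77]
control = [72, 80, 65, 84, 70, 79, 75, 71]

t_value, df = pooled_t(experimental, control)
print(f"t = {t_value:.2f} with {df} degrees of freedom")
```

The resulting t is then compared with the theoretical
sampling distribution of t for the given degrees of
freedom. If the scores are merely ranks or ratings, the
means and variances entering the formula have no clear
interpretation, which is the limitation the authors
acknowledge.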

The purpose of this article is not to single out one
particular study for criticism, but to demonstrate the
improvements in our studies which remain to be made. It is
disturbing to read and to hear the ready acceptance of the
results of many studies in the absence of a true evaluation
of the study itself. Too often it seems that the summary of
results is accepted without a proper examination of the
contents. If a comparison with bowling may be used, we are
watching the pins fall without considering the truly
important aspect of the game, the approach. Consequently,
our progress in the teaching of modern foreign languages
may be as “helter skelter” as the fallen pins.